Journal of Theoretical and Applied Information Technology 
15th May 2024. Vol.102. No 9
© Little Lion Scientific 




ISSN: 1992-8645 www.jatit.org E-ISSN: 1817-3195 


NEUMANN STACKED BILATERAL DEEP LEARNING 
BASED BIG SENTIMENT DATA ANALYTICS 


M ANOOP¹, K SUTHA²*, UMA SHANKARI SRINIVASAN³, M BALAMURUGAN⁴,
V ANITHA⁵, J EMERSON RAJA⁶


¹,² Assistant Professor, Department of Computer Science and Applications (B.Sc CS)
Faculty of Science and Humanities, SRM Institute of Science and Technology,
Ramapuram Campus, Chennai, India
³ Associate Professor, Department of Computer Science and Applications (BCA)
Faculty of Science and Humanities, SRM Institute of Science and Technology
Ramapuram Campus, Chennai, India
⁴ Associate Professor, Department of Computer Science and Engineering
Sri Sairam Engineering College, Chennai, India
⁵ Professor, Department of Computer Science and Engineering
Panimalar Engineering College, Chennai, India
⁶ Assistant Professor, Faculty of Engineering and Technology
Multimedia University, Melaka 75450, Malaysia.


Email: ¹profanoopcs@rediffmail.com, ²ksuthal986@gmail.com, ³umabalajees@gmail.com,
⁴balamurugan.cse@sairam.edu.in, ⁵annemoses2020@gmail.com, ⁶emerson.raja@mmu.edu.my


*Corresponding Author: ksuthal986@gmail.com 


ABSTRACT 


Sentiment analysis extracts information from text sources such as blogs, reviews, and news. The
purpose of sentiment analysis on big data is to classify emotions or opinions into distinct sentiment
classes. Conventional deep learning methods have been developed to classify tweets, but they require
long analysis times. To address this issue, Neumann Mutual Informative and Stacked Bilateral Deep
Learning (NMI-SBDL) is proposed for sentiment analysis, helping consumers research products or
services before making a purchase. First, a Knowledge Sentimental Graph is constructed from the tweets
in the Sentiment140 dataset. Second, computationally efficient, dimensionality-reduced tweets are
generated by the Neumann Mutual Information-based Feature selection algorithm. Finally, the Stacked
Bilateral LSTM-based model classifies the tweet polarity. In this way, robust sentiment analysis is
performed on tweets collected through the Twitter Application Programming Interface (API), with higher
accuracy and lower computation time. The proposed NMI-SBDL and existing methods are evaluated
experimentally on several metrics using Python libraries. Compared with the existing approaches,
NMI-SBDL improves sentiment analysis accuracy, precision, and recall by 13%, 6%, and 6%, respectively,
and reduces time by 23%. The paper concludes that NMI-SBDL provides accurate and robust sentiment
analysis for big data.

Keywords: Big Data, Sentiment Analysis, Neumann Mutual Information, Feature Selection, Stacked
Bilateral, Long Short-Term Memory


1. INTRODUCTION

Sentiment analysis is an activity that analyzes the feeling, mood, affect, emotional state,
and temperament of the public from written language. In the big data era, an effective
sentiment analysis mechanism is a requisite in several respects, specifically for learning
emotions. Sentiment analysis has had a marked impact on professional and even social
broadcasting in this day and age. Owing to the swift evolution of social media, the whole
world can express its emotional state and opinions via the internet. Hence, sentiment
analysis plays a crucial role in understanding the perspectives of consumers or reviewers.
Furthermore, it serves as a vital tool for examining collective emotions within a community.


A hybrid deep learning method that integrated the advantages of a classification model and a
Transformer model while eliminating the disadvantages of the individual models, combining the
Robustly Optimized BERT (Bidirectional Encoder Representations from Transformers) approach
(RoBERTa) with Long Short-Term Memory (LSTM) and called RoBERTa-LSTM, was proposed in [1]
for sentiment analysis. RoBERTa condensed the input words into a compact, meaningful word
embedding space. The LSTM model, on the other hand, captured the long-distance contextual
semantics efficiently.


With this integrated method, improvements were observed in accuracy, precision, recall, and
F1-score. Despite these improvements, deeper and more varied tweet patterns cannot be formed
when different types of data are present. To address this, NMI-SBDL uses Neumann Mutual
Information-based Feature selection. The selected tweets can be used, via the Knowledge
Sentimental Graph, to extract and discover extensive and new tweet patterns in a
computationally efficient manner (i.e., in terms of time and accuracy). With the graph,
different types of tweets are associated with one another, which in turn supports richer
data services than a word embedding space.


Sentiment analysis of demonetization tweets employing a heuristic deep neural network
(SenDemonNet) was proposed in [2]. The main objective was to capture the public view of the
recently deployed demonetization plan using SenDemonNet. First, tweet preprocessing was
performed to clean the text data. Second, features were extracted from the processed tweets
using Bag of n-grams, TF-IDF, and word2vec. Finally, classification was performed using the
hybrid Forest-Whale Optimization Algorithm (F-WOA) with the objective of improving the
classification results and thereby reaching the maximum accuracy rate.


Although accuracy improved, an optimization-based mechanism cannot sustain the performance
of an information retrieval system on big data in domains where all time instances of the
input sequence are available. To address this issue, Stacked Bilateral LSTM-based Sentiment
analysis is employed in NMI-SBDL to handle big data (the Sentiment140 dataset) when
classifying tweets. The dimensionality reduced tweets are used as input to the Stacked
Bilateral mechanism, and the hyperbolic tangent activation function is employed for measuring
the tweets. The Twitter data are analyzed in both the forward and backward directions to
enhance information retrieval performance in terms of precision and recall.


A hybrid convolutional neural network-long short-term memory (CNN-LSTM) model was proposed
in [3] for sentiment analysis. The CNN-LSTM method used dropout, max pooling, and group
normalization to obtain accurate sentiment analysis results. However, it failed to enhance
the accuracy. To address the issue, an integration of Convolutional Neural Network (CNN) and
Bidirectional Long Short-Term Memory (BiLSTM) models was presented in [4] for opinion
analysis of long texts. This combined model improved accuracy. However, fake information on
Twitter was not detected.


In this regard, the flow of information has shifted across many different media over time.
In an era of digitization, information and events worldwide are transmitted predominantly
via online social networks (OSN) such as Instagram, Twitter, and Facebook. Although
information spreads globally, this creates a serious risk of falsified information being
shared by some people, resulting in disruption and panic.


In [5], several classification methods were designed for classifying a sentiment dataset.
Moreover, Random Forest was applied, which in turn improved the accuracy rate. The analysis
of such sentiment data has massive potential to transform the way we work, but extracting
the underlying information is still found to be demanding. A modified Convolutional Neural
Network (CNN) for analyzing Twitter data was presented in [6] to extract features.
Fine-tuning the network improved the classification accuracy. However, the imbalanced nature
of the dataset could result in either overfitting or underfitting. To address this, a hybrid
method integrating the Support Vector Machine (SVM) algorithm and Particle Swarm
Optimization (PSO) was designed in [7]. With this integration, the imbalanced data issue was
addressed. However, social network reliability was not measured.


In this regard, the interference of fake news contributors and bots, which spread
propaganda as well as sensitive content over the network, has stimulated research into
evaluating social network reliability automatically by employing Artificial Intelligence
(AI). In [8], a multilingual model for bot identification on Twitter using DL methods was
proposed for measuring Twitter account credibility. But it failed to consider different
applications. To address the issue, a survey on sentiment classification employing DL was
presented in [9], along with various evaluation measures. Yet another work, on general
domain tweets, was analyzed in [10]. However, the false positive rate was not measured.


To overcome the problem, a Knowledge Sentimental Graph is presented in the proposed NMI-SBDL
to explicitly provide a User-Tweet pair. Additionally, instead of only selecting the
essential tweets, a dimensionality reduction factor is considered with minimum time, which
in turn eliminates edges and therefore reduces the graphs. We then synergistically combine
the dimensionality reduced tweets with anchor points for analyzing the polarity, to obtain
richer feature representations, significantly enhance sentiment analysis performance and
accuracy, and minimize the false positive rate.


1.1 Contributions 


The principal contributions of this paper include:

• To present a detailed description of the Neumann Mutual Informative and Stacked Bilateral
  Deep Learning (NMI-SBDL) method that performs sentiment analysis with respect to a query
  term.

• To design a Neumann Mutual Information-based Feature selection algorithm that discards
  irrelevant tweets and yields dimensionality reduced, computationally efficient, relevant
  tweets that can be used as input to classifiers.

• To offer a precise classification model that classifies tweet sentiment using the Stacked
  Bilateral LSTM-based Sentiment analysis algorithm.

• To conduct experiments on the benchmark Sentiment140 dataset and compare against
  conventional and state-of-the-art sentiment analysis methods. The results indicate that
  the proposed approach is resilient and performs competitively in terms of time, accuracy,
  precision, and recall.

1.2 Outline

The remainder of this paper is structured as follows. Section 2 reviews related work in the
domain of sentiment analysis. Section 3 elaborates on the proposed Neumann Mutual Informative
and Stacked Bilateral Deep Learning (NMI-SBDL) for sentiment analysis in detail. Section 4
presents the experimental settings for the NMI-SBDL method. Section 5 presents the evaluation
metrics and a detailed discussion. Finally, Section 6 concludes the paper.


2. RELATED WORKS 


Twitter, a popular social networking site where tweets are posted every second on topics
concerning society, politics, sports, entertainment, and many more, has received the
attention of the research community. Keeping an eye on user postings allows one to follow
the news happening globally and also assists in analyzing people’s opinions to a greater
extent.


A holistic approach for analyzing fluctuations in public opinion was presented in [11]. Yet
another in-depth analysis of sentiment via public opinions and emotions was carried out in
[12]. Unsupervised machine learning algorithms were employed for extracting tweets, and an
ensemble voting classifier was introduced for forecasting the retweetability of the posted
tweets. But the time consumed was higher.


On several social media sites, students discuss and share their day-to-day experiences in an
informal and casual manner. In this context, the paths taken by students provide significant
implicit knowledge and offer a comprehensive and novel perspective for researchers and
experts to understand students' behaviors beyond the classroom setting. A workflow combining
both qualitative analysis and large-scale data mining algorithms was proposed in [13] for
analyzing social data. However, the challenges of big data sentiment analysis were not
considered.


To address the issue, opinion mining and sentiment analysis (OMSA) has been used in the era
of big data as a convenient method for classifying views into distinct sentiments and, more
specifically, for measuring the public mood. A holistic systematic literature review
focusing on both the technical and non-technical characteristics of OMSA was discussed in
detail in [14].


The evolution of information technologies has provided new insights into intelligence
through human-centric expression, which can be as straightforward as a review or a
questionnaire response. The rapid growth of such large data produces many sources of
subjective information. Sentiment analysis has become an active topic as far as information
retrieval is concerned.


The relative popularity of various hashtags and the field with the maximum share of voice
were analyzed in [15] using Jaccard similarity. With this mechanism, the accuracy rate was
reported to improve. However, on complex training data, hybrid methods may reduce sentiment
errors further. To address the issue, the reliability of different hybrid methods on
heterogeneous datasets was studied in [16]. Across domains and datasets, hybrid methods were
compared with single methods. With this hybrid design, accuracy was found to improve.


In [17], the application of DL techniques to social media analytics was investigated in
depth. But accuracy was not the focus. To overcome the issue, the Q-learning technique was
applied in [18] for predicting Bitcoin prices. Simulations showed that the tweets posted by
users had an impact on future prices, while the technique reduced time spent and CPU
consumption. Nevertheless, it failed to consider recall.


A systematic literature review on document-based sentiment analysis using DL was presented
in [19]. But it failed to consider accuracy. An ensemble deep learning model was proposed in
[20] to concentrate on the classification accuracy of social media sentiment analysis.
However, the features were not extracted with higher classification performance. In [21],
Covid-19 tweets were analyzed using deep learning. Here, classification into positive,
negative, and neutral tweets was performed to address the precision factor. Yet another
comparative analysis of deep learning algorithms for predicting the factors that influence
products was carried out in [22]. However, the classification performance was not improved.


Over the past few years, Gated Recurrent Neural Networks have been used for classifying
sentiment owing to their ability to preserve semantics over a period of time. Despite this,
handling negation and intensification with a recurrent architecture was found to be a
demanding issue. In [23], sentimental relation analysis using a gated recurrent neural
network was proposed with the purpose of capturing sentimental relations for higher
classification performance. Sentiment analysis using natural language processing was
performed in [24]. But it failed to handle large datasets in less time. A majority voting
process was applied in [25] for Twitter sentiment analysis employing a cooperative binary
cluster model. However, preprocessing was not considered.


2.1 Problem Statement 


Recent developments in sentiment analysis have led to growth in the promotion and selling of
numerous products and services via the internet using new deep learning procedures. With
deep learning-based techniques, a classification algorithm is trained on distinct features
in tweets that can differentiate the worthiness of products, and can therefore contribute to
business development in the overall economy and help consumers make wise purchase decisions.
These features or tweets, obtained from the Sentiment140 dataset, are extracted and analyzed
for customer satisfaction. The existing deep learning-based methods [1] [2] extracted
features with higher accuracy, but the time involved in the case of big data was not
addressed. Also, the prediction characteristics of recall and precision were not the focus
of state-of-the-art methods [3] [4]; as a result, thousands of fake products or services are
mushrooming every day. Therefore, there is a need to design a method that analyzes
sentiments precisely and accurately and also discards negative tweets.


2.2 Proposed Solution


To solve the above problem, this paper presents a deep learning-based Neumann Mutual
Informative and Stacked Bilateral Deep Learning (NMI-SBDL) method for sentiment analysis
that selects computationally efficient, dimensionality reduced tweets and enhances the
precision and recall rate. The paper presents a model to detect essential tweets with the
aid of the tweet information present in the Sentiment140 dataset. The proposed method
selects the tweets and analyzes them, via a robust classification model, to detect whether
the given tweets are essential for determining product worthiness. In addition, accuracy and
sentiment analysis performance are improved and time is minimized.


3. PROPOSED NEUMANN MUTUAL 
INFORMATIVE AND STACKED 
BILATERAL DEEP LEARNING FOR 
SENTIMENT ANALYSIS 


Sentiment Big Data analytics refers to the automated interpretation and classification of
tweet polarities (i.e., neutral, positive, or negative) from social media posts or huge
amounts of incoming data. Its objective is to scrutinize people’s points of view in a way
that can help organizations develop. It concentrates not only on polarity (i.e., neutral,
positive, negative) but also on detecting sentiments. In this work, a method called Neumann
Mutual Informative and Stacked Bilateral Deep Learning (NMI-SBDL) for sentiment analysis is
proposed to improve accuracy in a timely manner when identifying sentiment polarity. A
detailed description of the NMI-SBDL method is provided below, beginning with data
collection and problem formulation.


3.1 Data Collection 


The efficiency of the proposed method is measured using the Sentiment140 dataset obtained
from https://www.kaggle.com/kazanova/sentiment140. The tweets, extracted by utilizing the
Twitter Application Programming Interfaces (APIs), are gathered to evaluate the tweet
polarity, i.e., negative, neutral, or positive. The Sentiment140 dataset is acquired by
carrying out the following steps:

• Twitter search plan of action: All user tweets were extracted and stored using the APIs.

• Hashtag selection: The hashtags that are related to the selected user tweet events are
  provided in Table 1.

• Tweet collection: User tweets are stored using the tabulated hashtags. The tweets
  collected and extracted from distinct users based on Tweet ID are made directly accessible
  for further processing.

The structure of the dataset is presented in Table 1.


Table 1: Structure of Sentiment140 data collection

Dataset size        1,600,000 tweets
Tweet annotation    0 = negative; 2 = neutral; 4 = positive
Number of fields    6 fields
Sample tweet        @nationwide class no, it’s not behaving at all, I’m mad, why am I here?
                    because I can’t see you all

3.2 Problem Formulation


In this section, we first express the task of the proposed method mathematically by
constructing a Knowledge Sentimental Graph (KSG). The KSG interlinks unstructured data in a
meaningful manner. The KSG is a triple $(E, R, F)$, where $E = \{e_1, e_2, \ldots, e_m\}$
represents the entity set, $R = \{r_1, r_2, \ldots, r_k\}$ represents the relationship set,
and $F$ denotes the facts, i.e., the relationships between entities. Consider a User-Tweet
pair $(U, T)$, where $U = \{U_1, U_2, \ldots, U_m\}$ denotes the $m$ users who tweeted and
$T = \{T_1, T_2, \ldots, T_n\}$ denotes the $n$ tweets made by the corresponding users. The
User-Tweet pair $(U, T)$ then represents the entity set, the relationships between users and
tweets form the relationship set, and the relationship between these two forms the fact. The
objective of the proposed method is to predict the tweet polarity $y \in \{0, 2, 4\}$, where
0, 2, and 4 denote the negative, neutral, and positive sentiment polarities, respectively.
The construction of the KSG relies on its entities (alternatively called nodes) and their
interactions with other elements, represented in the form of a graph. Every entity (referred
to as a node or user) can exchange information with other entities (nodes or users).
Figure 1 shows a simple KSG structure in which two nodes or users denote distinct entities.


[Figure: two nodes, ‘User_i tweet’ and ‘User_j tweet’, joined by an edge labelled "Edge
relationship between ‘User_i’ tweet and ‘User_j’ tweet"]


Figure 1: KSG structure with two entities and its relationship


As shown in the above figure, there is an association between these two nodes or users that
signifies their relationship. The first node or user ‘i’ denotes the subject, the second
node or user ‘j’ denotes the object, and their relationship (i.e., between user ‘i’ and
user ‘j’) is referred to as the predicate.
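
The subject-predicate-object structure described above can be sketched in a few lines. The example below is only an illustration: the paper does not state how an edge between two users is decided, so the rule used here (two users are linked when their tweets share at least one token) and the predicate name `shares_term_with` are assumptions.

```python
from collections import defaultdict

def build_ksg(user_tweets):
    """Build a toy KSG as (entities, relations, facts).

    user_tweets: iterable of (user, tweet) pairs.
    Facts are (subject, predicate, object) triples. The linking rule
    (shared token between two users' tweets) is an assumption.
    """
    tokens_by_user = defaultdict(set)
    for user, tweet in user_tweets:
        tokens_by_user[user].update(tweet.lower().split())

    entities = sorted(tokens_by_user)
    relations = {"shares_term_with"}
    facts = []
    for i, subj in enumerate(entities):
        for obj in entities[i + 1:]:
            if tokens_by_user[subj] & tokens_by_user[obj]:
                facts.append((subj, "shares_term_with", obj))
    return entities, relations, facts

pairs = [("u1", "battery life is great"), ("u2", "great camera"), ("u3", "slow delivery")]
entities, relations, facts = build_ksg(pairs)
```

Here `u1` and `u2` both use the word "great", so a single fact links them, while `u3` stays an isolated node.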


3.3 Pre-Processing


Pre-processing the raw tweets extracted using the APIs requires eliminating characters that
do not help in detecting sentiment. The unwanted items range from HTML characters to the
special character “@”, URLs, and hashtags. Pre-processing also covers case folding,
long-word elimination, and stop-word elimination.
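
A minimal sketch of such a cleaning step, using only the Python standard library, is given below. The stop-word list and the long-word cut-off are hypothetical placeholders, since the paper does not list the values it used.

```python
import html
import re

# Small illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "it", "at", "to", "and", "of", "i", "am"}
MAX_WORD_LEN = 20  # hypothetical cut-off for "long words"

def preprocess(tweet):
    """Return the cleaned tokens of one raw tweet."""
    text = html.unescape(tweet)                          # decode HTML entities
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # drop URLs
    text = re.sub(r"[@#]\w+", " ", text)                 # drop @mentions and #hashtags
    text = re.sub(r"['’]", "", text)                     # join contractions (it's -> its)
    text = re.sub(r"[^a-zA-Z\s]", " ", text)             # drop other special characters
    tokens = text.lower().split()                        # case folding and tokenisation
    return [t for t in tokens if t not in STOP_WORDS and len(t) <= MAX_WORD_LEN]

cleaned = preprocess(
    "@nationwideclass No, it's NOT behaving at all &amp; I'm mad! http://t.co/x #fail"
)
```

Negation words such as "not" are deliberately left out of the stop-word list in this sketch, because removing them would flip the apparent polarity of a tweet.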

3.4 Neumann Mutual Information-based Feature selection


From the processed tweets, the essential features or tweets have to be selected by reducing
the dimensionality so that accuracy can be improved with a minimum error rate. In this work,
the Neumann Mutual Information-based Feature selection criterion for Knowledge Sentimental
Graphs (KSG) is used. This criterion is applied to the tweets in the training process, after
KSG construction and preprocessing. Dimensionality reduction eliminates edges and therefore
reduces the size of the graphs. Figure 2 shows the construction of the Neumann Mutual
Information-based Feature selection model.
[Figure 2: Neumann Mutual Information-based Feature selection model. Blocks: Sentiment140
dataset, Neumann Mutual Information, dimensionality reduced tweets]


With the Sentiment140 dataset provided as input, the input vector (i.e., the User-Tweet
pairs) is formulated first. Next, the input vector is subjected to the average quantum
mutual information to produce dimensionality reduced tweets. Let us first formulate the
input vector as given below.


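$$IV = \begin{bmatrix} U_1T_1 & U_1T_2 & \cdots & U_1T_n \\ U_2T_1 & U_2T_2 & \cdots & U_2T_n \\ \vdots & \vdots & \ddots & \vdots \\ U_mT_1 & U_mT_2 & \cdots & U_mT_n \end{bmatrix} \tag{1}$$


From the above equation (1), the input vector ‘$IV$’ is formulated based on the ‘$m$’ users
and ‘$n$’ tweets. The Neumann Mutual Information between the User-Tweet pair ‘$(U, T)$’ for
a probability distribution of two users ‘$U_i$’, ‘$U_j$’ is mathematically stated as given
below.


$$\mathrm{Prob}(U_i) = \sum_{U_j} \mathrm{Prob}(U_i, U_j) \tag{2}$$

$$\mathrm{Prob}(U_j) = \sum_{U_i} \mathrm{Prob}(U_i, U_j) \tag{3}$$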


Let us consider a quantum system that can be split into two portions, ‘$U_i$’ and ‘$U_j$’,
so that independent measurements can be made on either portion. Then, the state space of the
entire quantum system is formulated as given below.




$$H_{U_iU_j} = H_{U_i} \otimes H_{U_j} \tag{4}$$


From the above state space of the entire quantum system, let ‘$\rho^{U_iU_j}$’ be the joint
user density matrix; then the corresponding operation for the density matrix of user
‘$U_i$’ is represented as given below.


$$\rho^{U_i} = \mathrm{Tr}_{U_j}\, \rho^{U_iU_j} \tag{5}$$


In a similar manner, the corresponding operation for the density matrix of user ‘$U_j$’ is
mathematically represented as given below.


$$\rho^{U_j} = \mathrm{Tr}_{U_i}\, \rho^{U_iU_j} \tag{6}$$


Finally, the average quantum mutual information with dimensionality reduced tweets in the
corresponding state space is mathematically formulated as given below.


$$DRT = S\left(\rho^{U_iU_j} \,\middle\|\, \rho^{U_i} \otimes \rho^{U_j}\right) \tag{7}$$


As given in the above equation (7), the average quantum mutual information is measured for
every edge. If it is smaller than the threshold, the edge is discarded; otherwise, the edge
is retained for further processing. The pseudocode of Neumann Mutual Information-based
Feature selection is given below.


Input: Dataset ‘DS’, Features ‘$F = \{F_1, F_2, \ldots, F_n\}$’, entity ‘$E = \{e_1, e_2, \ldots, e_m\}$’,
relationship set ‘$R = \{r_1, r_2, \ldots, r_k\}$’, Tweets ‘$T = \{T_1, T_2, \ldots, T_n\}$’

Output: Computationally-efficient dimensionality reduced tweets

Step 1: Initialize ‘m’, ‘n’, ‘k’
Step 2: Begin
Step 3: For each Dataset ‘DS’ with Features ‘F’
Step 4:   Formulate the input vector as given in equation (1)
Step 5:   For each User-Tweet pair ‘(U, T)’
Step 6:     Evaluate probability distribution of two users ‘$U_i$’, ‘$U_j$’ as given in equations (2) and (3)
Step 7:     Model state space of the entire quantum system as given in equation (4)
Step 8:     Evaluate user ‘$U_i$’ density matrix as given in equation (5)
Step 9:     Evaluate user ‘$U_j$’ density matrix as given in equation (6)
Step 10:    Return dimensionality reduced tweets ‘DRT’ as given in equation (7)
Step 11:  End for
Step 12: End for
Step 13: End


Algorithm 1: Neumann Mutual Information-based Feature selection 


As outlined in the algorithm above, given the dataset and features as input, the first step
formulates the input vector based on the Knowledge Sentimental Graph. Second, for each
User-Tweet pair, the probability distribution of each of the two users’ tweets is obtained
separately. Next, the state space of the entire quantum system with respect to the
User-Tweet pair is formulated, from which the density matrix is obtained separately for each
user. Finally, the average quantum mutual information is evaluated to obtain the
dimensionality reduced tweets.
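
The edge-pruning idea can be illustrated with a small sketch. It is not the authors' implementation: it treats the special case in which the joint density matrix is diagonal, where the quantity in equation (7) reduces to the classical mutual information of a joint probability table. The joint tables, edge names, and threshold below are hypothetical.

```python
import math

def mutual_information(joint):
    """Mutual information (in bits) of a joint probability table joint[i][j].

    Classical (diagonal density matrix) special case of equation (7).
    """
    p_i = [sum(row) for row in joint]            # row marginal, as in equation (2)
    p_j = [sum(col) for col in zip(*joint)]      # column marginal, as in equation (3)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:
                mi += p * math.log2(p / (p_i[i] * p_j[j]))
    return mi

def prune_edges(edge_joints, threshold):
    """Keep only the edges whose mutual information reaches the threshold."""
    kept = {}
    for edge, joint in edge_joints.items():
        score = mutual_information(joint)
        if score >= threshold:
            kept[edge] = score
    return kept

# Hypothetical joint distributions attached to two KSG edges.
edge_joints = {
    ("u1", "u2"): [[0.5, 0.0], [0.0, 0.5]],      # perfectly dependent
    ("u1", "u3"): [[0.25, 0.25], [0.25, 0.25]],  # independent
}
kept = prune_edges(edge_joints, threshold=0.1)
```

The dependent edge carries one bit of mutual information and is retained, while the independent edge scores zero and is discarded, which mirrors the threshold rule stated after equation (7).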


3.5 Stacked Bilateral LSTM-Based Sentiment Analysis Using Dimensionality Reduced Tweets


In this section, we introduce a sentiment analyzer for Twitter sentiment classification in
social media posts based on Stacked Bilateral Long Short-Term Memory (LSTM). The LSTM
architecture enables the network to capture long-term relationships through the use of
forget and remember gates, allowing the cell to decide whether to retain or discard tweets
based on their strength and relevance. In our work, Stacked Bilateral means that sentiments
are analyzed in both the forward and backward directions for a robust representation, and
the two results are then stacked to produce the final output for classification. Figure 3
shows the structure of the Stacked Bilateral LSTM-based Sentiment analysis model.


Figure 3: Stacked Bilateral LSTM cell gates-based sentimental analysis 


As shown in the above figure, the Twitter post is first tokenized according to the Knowledge
Sentimental Graph, and the dimensionality reduced tweets are then generated. The generated
tweets are fed as input into the embedding layer, which translates the tweet tokens into
encoded tweet embeddings. The LSTM is then trained using the sequence of encoded tweets as
input. A fully connected layer processes the output of the LSTM and is activated with the
sigmoid function to produce the final output estimates. The labels of the posts used for
training were categorized and encoded as negative ('0'), neutral ('2'), and positive ('4'),
respectively.

Let the new input received by the neuron at time instance ‘$t$’ be denoted as ‘$DRT_t$’.
The input ‘$DRT_t$’ is the dimensionality reduced data passed through the input gate via the
hyperbolic tangent activation function. At time instance ‘$t$’, the neuron holds both the
long-term memory ‘$LT$’ (i.e., ‘$LT_{t-1}$’) and the operating memory ‘$OM$’ (i.e.,
‘$OM_{t-1}$’) from the prior time instance ‘$t-1$’. The dimensionality reduced tweets in the
long-term memory are the tweet instances used for the whole training process, whereas the
dimensionality reduced tweets in the operating memory are the tweet instances used
preferentially. With ‘$DRT_t$’, ‘$LT_{t-1}$’, and ‘$OM_{t-1}$’, a forget gate is constructed
that establishes which tweets are to be retained and which are to be discarded. The forget
gate is mathematically stated as given below.


$$F_t = \sigma(W_F\, DRT_t + U_F\, OM_{t-1}) \tag{8}$$

From the above equation (8), the forget gate at time instance ‘$t$’, ‘$F_t$’, is evaluated
based on the sigmoid activation function ‘$\sigma$’, the weights ‘$W_F$’ corresponding to
the input dimensionality reduced tweets ‘$DRT_t$’, and the weights ‘$U_F$’ corresponding to
the operating memory ‘$OM_{t-1}$’. Next, the neuron evaluates the essential tweets ‘$LT'$’
from ‘$DRT_t$’, which are mathematically obtained as given below.
$$LT' = g(W_L\, DRT_t + U_L\, OM_{t-1}) \tag{9}$$

From the above equation (9), ‘$g$’ denotes the hyperbolic tangent activation function.
Before ‘$LT'$’ is included in the memory cell, the neuron estimates the tweets ‘$S$’ that
are useful for saving in the memory cell. A neuron that saves the estimated tweets in the
memory cell conventionally reads in the forward direction only. In our work, both the
forward direction and the backward direction are taken into consideration and the outputs
are stacked together. This is mathematically expressed as given below.


$$S_t = \sigma(W_S\, DRT_t + U_S\, OM_{t-1}) \tag{10}$$

$$S^{BiLSTM} = S^{forward} \oplus S^{backward} \tag{11}$$


With the above resultant tweets saved in the memory as evaluated in (10) and (11), the
tweets in the long-term memory are updated as given below.


$$LT_t = F_t \circ LT_{t-1} + S^{BiLSTM} \circ LT' \tag{12}$$


From the above equation (12), ‘$\circ$’ denotes the element-wise multiplication applied to
the tweets present in the long-term memory for sentiment analysis. Finally, the operating
memory ‘$OM$’ at instance ‘$t$’ is updated as given below.


$$A_t = \sigma(W_A\, DRT_t + U_A\, OM_{t-1}) \tag{13}$$

$$OM_t = A_t \circ g(LT_t) \tag{14}$$


From the above equations (13) and (14), the anchor point ‘$A_t$’ is a ternary vector that
regulates the significance of tweets. The neuron applies the hyperbolic tangent to the
long-term memory and performs element-wise multiplication with ‘$A_t$’ to update the tweets
in the operating memory ‘$OM_t$’. Once both the long-term memory and the operating memory
have been updated at instance ‘$t$’, the neuron (i.e., user) identifies which tweets are to
be output to other neurons to enhance the learning process. The complete procedure is
iterated until no more input data is present in the neuron. The pseudocode of Stacked
Bilateral LSTM-based Sentiment analysis is given below.


Input: Dataset ‘DS’, Features ‘F = {F_1, F_2, ..., F_n}’, entity ‘E = {e_1, e_2, ..., e_m}’, relationship set ‘R =
{r_1, r_2}’, Tweets ‘T = {T_1, T_2, ..., T_k}’
Output: Robust sentiment analysis
Step 1: Initialize dimensionality reduced tweets ‘DRT’
Step 2: Initialize ‘m’, ‘n’, ‘k’
Step 3: Begin
Step 4: For each Dataset ‘DS’ with Features ‘F’ and dimensionality reduced tweets ‘DRT’
Step 5: Formulate forget gate as given in equation (8)
Step 6: Evaluate essential tweets using hyperbolic tangent activation function as given in equation (9)
Step 7: Evaluate tweets that are useful for saving in the memory cell both in the forward and backward
direction as given in equations (10) and (11)
Step 8: Update tweets in long term memory as given in equation (12)
Step 9: Evaluate the anchor point for analyzing the polarity as given in equations (13) and (14)
Step 10: If ‘A_t = 0’
Step 11: Then tweet polarity is negative
Step 12: Return polarity result
Step 13: End if
Step 14: If ‘A_t = 2’
Step 15: Then tweet polarity is neutral
Step 16: Return polarity result
Step 17: End if
Step 18: If ‘A_t = 4’
Step 19: Then tweet polarity is positive
Step 20: Return polarity result
Step 21: End if
Step 22: End for
Step 23: End


Algorithm 2: Stacked Bilateral LSTM-based Sentiment analysis 
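
Steps 10 to 21 of Algorithm 2 amount to a lookup from the anchor point value to a polarity name. A minimal sketch of that decision is given below; the function name `polarity_from_anchor`, the dictionary name and the error raised for other values are assumptions made for illustration and are not part of the algorithm as published.

```python
# Hypothetical helper illustrating Steps 10-21 of Algorithm 2.
# The codes 0, 2 and 4 are the polarity codes named in the algorithm.
POLARITY_BY_CODE = {0: "negative", 2: "neutral", 4: "positive"}


def polarity_from_anchor(anchor_code: int) -> str:
    """Return the polarity name for an anchor point value of 0, 2 or 4."""
    try:
        return POLARITY_BY_CODE[anchor_code]
    except KeyError:
        # Assumption: any other value is treated as an error, since the
        # algorithm does not say what to do with it.
        raise ValueError(f"unexpected anchor code: {anchor_code!r}") from None
```

For instance, `polarity_from_anchor(2)` gives `"neutral"`, matching Steps 14 to 16.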
As outlined in the algorithm above, the goal is to
enhance the ratio of accurately predicted positive 
observations and improve the proportion of 
pertinent tweets acquired for sentiment analysis. 
To achieve this, an LSTM-based neural network 
with a Stacked Bilateral mechanism is devised. 
Through the application of the Stacked Bilateral 
mechanism, the input-dimensionality-reduced 
tweet undergoes processing in the neural network 
from the start to the end and then iteratively from 
the end to the start. This approach accelerates the 
learning and analysis of sentiment, leading to an 
improvement in the network's accuracy in correct 
sentiment analysis and a reduction in incorrect 
sentiment analyses. 
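
The forward and backward traversal can be illustrated with a small, dependency-free sketch of equations (8) to (14). It is not the authors' implementation: the gate keys `f`, `l`, `s`, `a`, the random initialisation, the list-based linear algebra and all function names are assumptions. It also follows the usual bidirectional convention in which each direction applies its own save gate in equation (12) and the two per-step operation memories are concatenated for equation (11); the paper does not state exactly how the stacked value is coupled back into the update, so this is an interpretation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, U, h):
    """Compute W x + U h using plain lists (one row of W and U per unit)."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        for w_row, u_row in zip(W, U)
    ]


def init_params(input_dim, hidden_dim, seed):
    """Random (W, U) pairs for the gates f, l, s, a (key names are assumptions)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        name: (matrix(hidden_dim, input_dim), matrix(hidden_dim, hidden_dim))
        for name in ("f", "l", "s", "a")
    }


def gate(params, name, drt, om, activation):
    W, U = params[name]
    return [activation(v) for v in affine(W, drt, U, om)]


def run_direction(drt_sequence, params, hidden_dim):
    """One pass over the sequence; returns the operation memory at each step."""
    lt = [0.0] * hidden_dim  # long-term memory LT_t
    om = [0.0] * hidden_dim  # operation memory OM_t
    outputs = []
    for drt in drt_sequence:
        f = gate(params, "f", drt, om, sigmoid)       # forget gate, eq. (8)
        cand = gate(params, "l", drt, om, math.tanh)  # essential tweets LT', eq. (9)
        s = gate(params, "s", drt, om, sigmoid)       # save gate S_t, eq. (10)
        lt = [fi * li + si * ci for fi, li, si, ci in zip(f, lt, s, cand)]  # eq. (12)
        a = gate(params, "a", drt, om, sigmoid)       # anchor point A_t, eq. (13)
        om = [ai * math.tanh(li) for ai, li in zip(a, lt)]  # eq. (14)
        outputs.append(om)
    return outputs


def stacked_bilateral(drt_sequence, fwd_params, bwd_params, hidden_dim):
    forward = run_direction(drt_sequence, fwd_params, hidden_dim)
    backward = run_direction(drt_sequence[::-1], bwd_params, hidden_dim)[::-1]
    # Eq. (11): stack (concatenate) the two directions at every step.
    return [fw + bw for fw, bw in zip(forward, backward)]


# Toy input: three dimensionality reduced tweet vectors of size 3.
sequence = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.5], [0.1, 0.1, 0.1]]
stacked = stacked_bilateral(
    sequence, init_params(3, 2, seed=1), init_params(3, 2, seed=2), hidden_dim=2
)
```

Each element of `stacked` has twice the hidden size, because the forward and backward operation memories for the same position are placed side by side.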


4. EXPERIMENTAL SETUP 


Within the realm of extensive sentiment 
data analysis, the openly accessible Sentiment140 
dataset is employed. This dataset encompasses 
1,600,000 tweets extracted using the Twitter API. 
The tweets are categorized with annotations such 
as '0 = negative,' '2 = neutral,' and '4 = positive,' 
making them suitable for sentiment detection. 
Refer to Table 2 for specific details regarding the 
dataset. 


Table 2: Details of Sentiment140 dataset 


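S. No | Feature | Description
1 | target | Polarity of the tweet (0 = negative, 2 = neutral, 4 = positive)
2 | ids | Identifier of the tweet
3 | date | Date of the tweet
4 | flag | Query
5 | user | User that tweeted
6 | text | Text of the tweet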
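
As an illustration of how the six fields might be read, the sketch below parses rows in the column order used by the public Sentiment140 CSV distribution (target, ids, date, flag, user, text, with no header row). The two sample rows are made up for the example and are not records from the dataset; the names `FIELDS`, `POLARITY` and `read_tweets` are likewise assumptions.

```python
import csv
import io

# Column order of the public Sentiment140 CSV file (the file has no header row).
FIELDS = ["target", "ids", "date", "flag", "user", "text"]
POLARITY = {"0": "negative", "2": "neutral", "4": "positive"}


def read_tweets(handle):
    """Parse rows into dictionaries and attach a readable polarity name."""
    reader = csv.DictReader(handle, fieldnames=FIELDS)
    return [{**row, "polarity": POLARITY[row["target"]]} for row in reader]


# Two invented rows in the six-column layout (not real dataset records).
SAMPLE = (
    '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","missed the bus again"\n'
    '"4","2","Mon Apr 06 22:20:01 PDT 2009","NO_QUERY","user_b","great coffee this morning"\n'
)
tweets = read_tweets(io.StringIO(SAMPLE))
```

With a real copy of the dataset, the same function could be given an open file handle instead of the in-memory sample.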


With the aid of the above features, four
performance metrics (accuracy, time, precision
and recall rate) are used to measure the
performance of our proposed method, Neumann
Mutual Informative and Stacked Bilateral Deep
Learning (NMI-SBDL), for sentiment analysis. It
is compared with two conventional methods, the
Robustly optimized Bidirectional Encoder
Representations from Transformers approach
(RoBERTa) combined with Long Short-Term
Memory (LSTM) (RoBERTa-LSTM) [1] and
SenDemonNet [2], and with two other
state-of-the-art sentiment analysis methods,
CNN-LSTM [3] and CNN-BiLSTM [4]. The
procedures were implemented using Python, a
high-level, general-purpose programming
language.


5. DISCUSSION 
5.1 Performance Analysis of Sentiment 
Analysis Accuracy 


The primary parameter required for
sentiment analysis using Twitter is the accuracy
rate. In other words, sentiment analysis accuracy
refers to the statistical measure of how well the
sentiment analysis correctly identifies or
excludes a given polarity for the sample tweets.


$SA_{acc} = \sum_{i=1}^{n} \frac{T_{AC}}{T_i} \times 100$ (15)


Derived from the equation (15), the 
accuracy of sentiment analysis, denoted as 
'SA_acc,' represents the percentage ratio of 
sample tweets accurately assessed or classified as 
'T_AC' to the total sample tweets involved in the 
simulation process denoted as 'T_i.' It is measured 
in terms of percentage (%). Table 3 summarizes
the sentiment analysis accuracy obtained using
the proposed NMI-SBDL method, the two
conventional methods, RoBERTa-LSTM [1] and
SenDemonNet [2], and the two state-of-the-art
methods, CNN-LSTM [3] and CNN-BiLSTM [4].
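
A worked example of equation (15) is sketched below. It reads the equation as the share of correctly classified sample tweets multiplied by 100, which is an interpretation of the summation; the function name and the eight toy labels are assumptions chosen only to make the arithmetic easy to follow.

```python
def sentiment_accuracy(true_labels, predicted_labels):
    """Percentage of sample tweets whose predicted polarity matches the label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one sample tweet is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


# Toy labels using the polarity codes 0, 2 and 4 (not data from the paper).
true = [0, 4, 4, 2, 0, 4, 0, 2]
pred = [0, 4, 0, 2, 0, 4, 4, 2]
accuracy = sentiment_accuracy(true, pred)  # 6 of 8 correct, i.e. 75.0
```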


Table 3: Comparative analysis of sentiment analysis accuracy using proposed NMI-SBDL, RoBERTa-
LSTM [1], SenDemonNet [2], CNN-LSTM [3] and CNN-BiLSTM [4]

Number of tweets | Sentiment analysis accuracy (%)
3000 | 96.85

Figure 4: Graphical representation of sentiment analysis accuracy


Figure 4 portrays the relative performance of
Twitter sentiment analysis for the proposed
NMI-SBDL method, the two conventional
methods, RoBERTa-LSTM [1] and SenDemonNet
[2], and the two state-of-the-art methods,
CNN-LSTM [3] and CNN-BiLSTM [4], in terms of
accuracy. From the figure it is inferred that the
sentiment analysis accuracy on the y-axis is
inversely related to the number of tweets on the
x-axis. To be more specific, increasing the number
of tweets decreases the sentiment analysis
accuracy and vice versa. In simulations conducted
with 1000 tweet samples, the sentiment analysis
accuracy was found to be 97.5% using the
NMI-SBDL method, 94.5% using [1], 93% using
[2], 92.7% using [3] and 91.5% using [4]. From
this result, the sentiment analysis accuracy was
found to be improved using the NMI-SBDL method
when compared to [1], [2], [3] and [4]. The reason
behind the improvement was the employment of
the Knowledge Sentimental Graph for each
User-Tweet pair, which evaluates the space of the
entire quantum. Finally, by employing the average
quantum mutual information, dimensionality
reduced tweets were obtained, with which the
classification is made for further processing. This
in turn increases the number of sample tweets
accurately assessed using the NMI-SBDL method.
As a result, the sentiment analysis accuracy using
the NMI-SBDL method was improved by 5% in
comparison to [1] and 10% in comparison to [2]
among the conventional methods, and by 15% in
comparison to [3] and 20% in comparison to [4]
among the state-of-the-art methods.


5.2 Performance Analysis of Sentiment 
Analysis Time 


The evaluation of the polarity of tweets
extracted through the Twitter API should require
only a short amount of time. In other words,
sentiment analysis time refers to the time
consumed in acquiring the user's tweets and
analyzing them based on the tweet polarity. This
is mathematically formulated as given below.


$SA_{time} = \sum_{i=1}^{n} T_i \times Time[A_t]$ (16)


From the above equation (16), the
sentiment analysis time ‘SA_time’ is estimated
based on the user tweet samples involved in
analyzing sentiments, ‘T_i’, and the time consumed
in arriving at the anchor point ‘A_t’, the ternary
vector that determines the significance of tweets,
‘Time[A_t]’. It is measured in terms of
milliseconds (ms). Table 4 summarizes the
sentiment analysis time obtained using the
proposed NMI-SBDL method, the two
conventional methods, RoBERTa-LSTM [1] and
SenDemonNet [2], and the two state-of-the-art
methods, CNN-LSTM [3] and CNN-BiLSTM [4].
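
One way to obtain a figure comparable to equation (16) is to time the classification of a batch of tweets and report the elapsed wall-clock time in milliseconds. The sketch below does this with `time.perf_counter`; the function names and the keyword-based `toy_classifier` are placeholders for illustration and do not represent the NMI-SBDL classifier.

```python
import time


def timed_analysis(tweets, classify):
    """Classify every tweet and return (polarities, elapsed time in ms)."""
    start = time.perf_counter()
    polarities = [classify(tweet) for tweet in tweets]
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return polarities, elapsed_ms


def toy_classifier(tweet):
    # Placeholder rule, assumed only for this example: 4 = positive, 0 = negative.
    return 4 if "good" in tweet.lower() else 0


polarities, elapsed_ms = timed_analysis(["Good day", "bad day"], toy_classifier)
```

The measured value depends on the machine and the load, so repeated runs would normally be averaged before being compared across methods.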
Table 4: Comparative analysis of sentiment analysis time using Proposed NMI-SBDL, RoBERTa-LSTM
[1], SenDemonNet [2], CNN-LSTM [3] and CNN-BiLSTM [4]

Number of tweets | Sentiment analysis time (ms)

Figure 5: Graphical representation of sentiment analysis time (x-axis: number of tweets; y-axis: sentiment analysis time (ms))


Figure 5 illustrates the comparative
efficiency of Twitter sentiment analysis for the
proposed NMI-SBDL method, two conventional
approaches, RoBERTa-LSTM [1] and
SenDemonNet [2], and two state-of-the-art
methods, CNN-LSTM [3] and CNN-BiLSTM [4],
with respect to sentiment analysis time. The
y-axis represents sentiment analysis time, and the
x-axis displays the number of tweets. The figure
indicates a direct proportionality between the
number of tweets and the sentiment analysis
time. Specifically, an increase in the number of
tweets increases the amount of sentiment
analysis to be performed for Twitter,
consequently leading to an increase in sentiment
analysis time. In simulations conducted with 1000
tweets, the time required for analyzing Twitter
sentiment using the NMI-SBDL method was
observed to be 250 ms, compared with 320 ms for
[1], 370 ms for [2], 440 ms for [3], and 500 ms for
[4]. From this result, the sentiment analysis time
was observed to be comparatively lower using
the NMI-SBDL method than using [1], [2], [3] and
[4]. The enhancement can be attributed to the
inclusion of Bayes Linear Regression before the
actual feature selection process. With this type of
regression, the proposed method utilizes the
Neumann Mutual Information between each
User-Tweet pair for a probability distribution of
two users for estimating the independent
measurements. As a result, the sentiment
analysis time for capturing user tweets and
detecting sentiment using the NMI-SBDL method
is decreased by 9% when compared to [1], 21%
when compared to [2], 29% when compared to
[3], and 34% when compared to [4], respectively.


5.3 Performance Analysis of Precision


The precision rate is defined as the ratio 
of relevant instances (i.e., analyzed relevant 
tweets) to retrieved instances (i.e., retrieved 
tweets) in sentiment detection. Mathematically, 
this is expressed as follows: 


$P = \frac{TP}{TP + FP}$ (17)

From the above equation (17), the
precision rate ‘P’ is evaluated on the basis of the
true positive instances ‘TP’ (i.e., positive tweets
detected as positive tweets) and the false positive
instances ‘FP’ (i.e., negative tweets detected as
positive tweets) respectively. Table 5
summarizes the precision rate obtained in the
sentiment analysis for the sample tweets using
the proposed NMI-SBDL method, the two
conventional methods, RoBERTa-LSTM [1] and
SenDemonNet [2], and the two state-of-the-art
methods, CNN-LSTM [3] and CNN-BiLSTM [4].
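
Equation (17) can be checked on a handful of labels. The sketch below treats the code 4 as the positive class, counts a false positive as a tweet that is not positive but is predicted as positive, and returns 0.0 when nothing is predicted positive; the function name, that zero-division convention and the toy labels are assumptions for illustration.

```python
def precision(true_labels, predicted_labels, positive=4):
    """TP / (TP + FP) for the chosen positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    if tp + fp == 0:
        return 0.0  # assumption: no positive predictions gives precision 0
    return tp / (tp + fp)


# Toy labels (not data from the paper): 2 true positives, 2 false positives.
true = [4, 4, 0, 0, 4, 2]
pred = [4, 0, 4, 0, 4, 4]
p_value = precision(true, pred)  # 2 / (2 + 2) = 0.5
```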


Table 5: Comparative analysis of precision using proposed NMI-SBDL, RoBERTa-LSTM [1], SenDemonNet [2], CNN- 
LSTM [3] and CNN-BiLSTM [4] 


Figure 6: Graphical representation of precision (x-axis: number of tweets; y-axis: precision)
Figure 6 given above illustrates the graphical
representation of precision for 10000 distinct
sample tweets used to detect sentiment. From the
above figure it is inferred that the precision rate
was found to be comparatively higher using the
NMI-SBDL method upon comparison with the two
conventional methods, [1] and [2], and the two
state-of-the-art methods, [3] and [4].
Correspondingly, the number of false predictions
using the NMI-SBDL method was lower than with
[1], [2], [3] and [4]. The reason behind the
minimization of false prediction results was the
application of the Stacked Bilateral LSTM cell for
sentiment analysis. By applying this mechanism,
the tweets were traversed for sentiment analysis
in both the forward direction and the backward
direction. Therefore, for classification, both the
long short-term memory results and the
operation memory results were utilized for
analysis. As a result, the precision rate using the
NMI-SBDL method was found to be better than
the conventional methods by 3% when compared
to [1] and 5% when compared to [2], and better
than the state-of-the-art methods by 7% when
compared to [3] and 9% when compared to [4].


5.4 Performance Analysis of Recall 


Finally, recall is measured, which refers
to the ratio of relevant instances (i.e., relevant
tweets) that were retrieved. Mathematically, this
is expressed as follows:

$R = \frac{TP}{TP + FN}$ (18)

From the above equation (18), the recall
rate ‘R’ is evaluated based on the true positive
instances ‘TP’ and the false negative instances
‘FN’ respectively. Table 6 displays the
corresponding results of recall rate using the
proposed NMI-SBDL method, the two
conventional methods, RoBERTa-LSTM [1] and
SenDemonNet [2], and the two state-of-the-art
methods, CNN-LSTM [3] and CNN-BiLSTM [4].
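
Equation (18) can be illustrated in the same way. In the sketch below a false negative is a positive tweet (code 4) that is predicted as anything other than positive; the function name, the 0.0 returned when there are no positive tweets, and the toy labels are assumptions for illustration.

```python
def recall(true_labels, predicted_labels, positive=4):
    """TP / (TP + FN) for the chosen positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp + fn == 0:
        return 0.0  # assumption: no positive tweets gives recall 0
    return tp / (tp + fn)


# Toy labels (not data from the paper): 2 true positives, 1 false negative.
true = [4, 4, 0, 0, 4, 2]
pred = [4, 0, 4, 0, 4, 4]
r_value = recall(true, pred)  # 2 / (2 + 1)
```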


Table 6: Comparative analysis of recall using proposed NMI-SBDL, RoBERTa-LSTM [1], SenDemonNet [2], CNN-
LSTM [3] and CNN-BiLSTM [4]

Number of tweets | Recall

Figure 7: Graphical representation of recall (x-axis: number of tweets; y-axis: recall)
Figure 7 given above illustrates the recall rate for
different numbers of tweets ranging between 1000
and 10000, employing the six different fields
acquired from Table 2. Also, a linear trend is
observed, i.e., increasing the number of sample
tweets causes an increase in the false negative
rate. This is because increasing the sample tweet
size increases the number of false negative
instances as well as the number of true positive
instances. However, the false negative rate using
the NMI-SBDL method was found to be
comparatively lower than that of the two
conventional methods, [1] and [2], and the two
state-of-the-art methods, [3] and [4]. The reason
behind the minimization of the false negative rate
was the Stacked Bilateral LSTM-based Sentiment
analysis algorithm applied to the tweets obtained.
With this algorithm, an integration of the
LSTM-based neural network and the Stacked
Bilateral mechanism is performed. Moreover, by
applying the Stacked Bilateral mechanism, the
dimensionality reduced tweet is fed via the neural
network from beginning to end and then from end
to beginning, therefore analyzing the sentiment
faster. Due to this, the accurate sentiment
analysis made by the network was improved and,
on the other hand, the wrong sentiment analysis
made by the network was reduced concurrently.
This in turn improves the recall rate using the
NMI-SBDL method by 3% compared to [1], 5%
compared to [2], 7% compared to [3] and 9%
compared to [4].


5.5 Comparison With State-Of-The-Art 
Methods 
The study results are discussed for the
proposed NMI-SBDL and the state-of-the-art
methods, namely RoBERTa-LSTM [1],
SenDemonNet [2], CNN-LSTM [3], and CNN-
BiLSTM [4], using the Sentiment140 dataset based
on various parameters, such as sentiment analysis
accuracy, sentiment analysis time, precision and
recall. Table 7 provides a detailed comparison of
the proposed and state-of-the-art methods.


Table 7: Comparative analysis of proposed with state-of-the-art methods 


Parameter | NMI-SBDL | RoBERTa-LSTM [1] | SenDemonNet [2] | CNN-LSTM [3] | CNN-BiLSTM [4]
Sentiment analysis accuracy (%) | 95.12 | 90.25 | 86.66 | 83.18 | …
Sentiment analysis time (ms) | … | … | … | … | …


Table 7 shows the comparative results of
sentiment analysis accuracy, sentiment analysis
time, precision, and recall for the proposed
NMI-SBDL and state-of-the-art methods such as
RoBERTa-LSTM [1], SenDemonNet [2], CNN-
LSTM [3] and CNN-BiLSTM [4]. From the above
table, the sentiment analysis accuracy, precision,
and recall using the proposed NMI-SBDL are
higher than those of the other existing methods.
The reason for the higher sentiment analysis
accuracy is the application of the Stacked
Bilateral LSTM-based Sentiment analysis
algorithm to categorize the tweet polarity. The
activation function then returns the
classification results and minimizes the false
negative rate. In this way, tweets are correctly
classified, which improves accuracy, precision,
and recall. Also, the sentiment analysis time of
the proposed NMI-SBDL is greatly reduced, by
429.5 ms, compared with the other works. The
sentiment analysis accuracy, precision, and recall
of NMI-SBDL are 95.12%, 0.91, and 0.89
respectively, which are the highest values among
the compared methods.


6. CONCLUSION 


In this paper, a sentiment analysis
method using Neumann Mutual Informative and
Stacked Bilateral Deep Learning (NMI-SBDL) is
proposed. The stages in the design encompass
feature selection and classification. Initially,
tweets are collected from diverse users and fed as
input to the Neumann Mutual Information-based
Feature Selection algorithm. Secondly,
computationally efficient dimensionality reduced
tweets are obtained with the Neumann Mutual
Information mechanism for minimizing
computation time. Subsequently, with the
dimensionality reduced tweets as input, the
tweets are classified by employing the Stacked
Bilateral LSTM-based Sentiment Analysis
algorithm with maximum precision and recall.


The comprehensive experimental
evaluation of NMI-SBDL and the conventional
methods is implemented using Python and
applied to the Sentiment140 dataset. The results
confirm that the NMI-SBDL method yields
superior outcomes in the performance metrics,
with a 23% reduction in sentiment analysis time
and improvements of 13% in accuracy, 6% in
precision, and 6% in recall, in comparison to both
conventional and state-of-the-art methods. The
proposed NMI-SBDL is shown to be effective in
accurately identifying tweets, reducing
computation time, and contributing to the overall
improvement of sentiment analysis performance.
In the future, the algorithm needs to be developed
further to acquire higher accuracy with the use of
deep learning and optimization approaches.